LST CPU Speedups #245

Open
GNiendorf wants to merge 2 commits into master from cpu_speedups_hoist

GNiendorf (Member) commented on Mar 18, 2026:

This PR timing (CPU), commit 2 (pre-checks, exact trig simplifications, and additional early exits):
[Screenshot 2026-03-20 at 2:08:12 AM]
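
For readers unfamiliar with this kind of change, here is a minimal sketch of what an exact trig simplification combined with a pre-check can look like. It is an illustration only and is not taken from the LST code: the function names, the cut itself, and the policy of rejecting a null vector are all assumptions.

```cpp
#include <cassert>
#include <cmath>

// Reference form: two transcendental calls per evaluation.
inline float sinOfAtan2Reference(float y, float x) {
  return std::sin(std::atan2(y, x));
}

// Simplified form: sin(atan2(y, x)) == y / sqrt(x^2 + y^2) for (x, y) != (0, 0).
// The identity is exact mathematically; floating-point results may differ in the last bits.
inline float sinOfAtan2Simplified(float y, float x) {
  return y / std::sqrt(x * x + y * y);
}

// Hypothetical cut (not an LST selection): accept when |sin(atan2(y, x))| < maxSin.
inline bool passesSinCut(float y, float x, float maxSin) {
  assert(maxSin >= 0.f);  // the squared comparison below assumes a non-negative threshold
  const float r2 = x * x + y * y;
  // Pre-check / early exit: the angle is undefined for a null vector (assumed policy: reject).
  if (r2 == 0.f)
    return false;
  // Both sides are non-negative, so |y| / sqrt(r2) < maxSin  <=>  y^2 < maxSin^2 * r2.
  // This avoids the sqrt and the division as well as the trig calls.
  return y * y < maxSin * maxSin * r2;
}
```

The squared comparison only needs multiplications, which is why rewrites of this type can pay off in hot selection loops. Whether a given rewrite preserves results bit for bit still has to be checked against the validation plots.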
This PR timing (CPU), commit 1 (reducing redundant memory loads):
[Screenshot 2026-03-19 at 9:13:29 PM]
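
As a rough illustration of what reducing redundant memory loads means, the sketch below hoists loop-invariant loads out of an inner loop. It is not code from this PR: the `HitsSoA` container, the pair-counting task, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical structure-of-arrays hit container; not the LST data format.
struct HitsSoA {
  std::vector<float> x;
  std::vector<float> y;
};

// Baseline: the outer hit's coordinates and the container size are re-read
// through the container on every inner iteration.
inline std::size_t countClosePairsBaseline(const HitsSoA& hits, float maxDist2) {
  assert(hits.x.size() == hits.y.size());
  std::size_t nPairs = 0;
  for (std::size_t i = 0; i < hits.x.size(); ++i) {
    for (std::size_t j = i + 1; j < hits.x.size(); ++j) {
      const float dx = hits.x[i] - hits.x[j];
      const float dy = hits.y[i] - hits.y[j];
      if (dx * dx + dy * dy < maxDist2)
        ++nPairs;
    }
  }
  return nPairs;
}

// Hoisted: loop-invariant values are loaded once into locals before the inner loop.
inline std::size_t countClosePairsHoisted(const HitsSoA& hits, float maxDist2) {
  assert(hits.x.size() == hits.y.size());
  const std::size_t nHits = hits.x.size();
  const float* x = hits.x.data();
  const float* y = hits.y.data();
  std::size_t nPairs = 0;
  for (std::size_t i = 0; i < nHits; ++i) {
    const float xi = x[i];  // loaded once per outer iteration
    const float yi = y[i];
    for (std::size_t j = i + 1; j < nHits; ++j) {
      const float dx = xi - x[j];
      const float dy = yi - y[j];
      if (dx * dx + dy * dy < maxDist2)
        ++nPairs;
    }
  }
  return nPairs;
}
```

Both functions return the same count. A compiler can often perform this hoisting itself in code this simple, so manual hoisting tends to matter mostly where it cannot prove that the loads are invariant, for example when writes through other pointers may alias the inputs.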
Master timing (CPU):
[Screenshot 2026-03-09 at 11:07:11 PM]

GNiendorf force-pushed the cpu_speedups_hoist branch from f5fbe61 to 83f2297 on March 18, 2026 17:30

GNiendorf (Member, Author) commented:

run-ci: all

GNiendorf marked this pull request as ready for review on March 18, 2026 17:37

github-actions commented:

The PR was built and ran successfully in standalone mode running on CPU. Here are some of the comparison plots.

Efficiency vs pT comparison Efficiency vs eta comparison
Fake rate vs pT comparison Fake rate vs eta comparison
Duplicate rate vs pT comparison Duplicate rate vs eta comparison

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     29.0    323.1    245.5    138.1     48.6    695.7     10.9    116.6    119.7    208.9      0.1    1936.1    1211.5+/- 290.1     602.5   explicit[s=4] (target branch)
   avg     28.1    218.8    178.6    127.2     49.7    700.7     10.6    109.5     83.0    202.6      0.1    1708.9     980.1+/- 239.1     545.5   explicit[s=4] (this PR)

github-actions commented:

The PR was built and ran successfully with CMSSW running on CPU. Here are some plots.

OOTB All Tracks
Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.

GNiendorf (Member, Author) commented:

run-ci: all
modifiers: gpu

github-actions commented:

The PR was built and ran successfully in standalone mode running on GPU. Here are some of the comparison plots.

Efficiency vs pT comparison Efficiency vs eta comparison
Fake rate vs pT comparison Fake rate vs eta comparison
Duplicate rate vs pT comparison Duplicate rate vs eta comparison

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     31.1      0.2      0.4      0.6      0.9      0.3      0.6      0.5      0.3      1.4      0.0      36.2       4.8+/-  2.7      36.2   explicit[s=1]
   avg      1.1      0.3      0.5      0.8      1.0      0.3      0.8      0.7      0.4      1.8      0.0       7.7       6.3+/-  2.8       4.0   explicit[s=2]
   avg      2.0      0.6      0.8      1.2      1.5      0.4      1.2      1.0      0.6      2.8      0.0      12.1       9.7+/-  3.5       3.2   explicit[s=4]
   avg      3.2      0.9      1.2      1.7      2.0      0.5      1.7      1.3      0.8      3.9      0.0      17.2      13.5+/-  4.3       3.0   explicit[s=6]
   avg      3.7      1.3      1.7      2.4      2.6      0.7      2.3      1.6      1.0      4.9      0.0      22.3      17.9+/-  4.6       2.9   explicit[s=8] (target branch)
   avg     31.1      0.2      0.4      0.6      0.9      0.3      0.6      0.5      0.3      1.4      0.0      36.2       4.8+/-  2.6      36.3   explicit[s=1]
   avg      1.3      0.3      0.5      0.7      1.0      0.3      0.8      0.7      0.4      1.8      0.0       7.9       6.4+/-  2.8       4.1   explicit[s=2]
   avg      2.2      0.6      0.8      1.2      1.5      0.4      1.2      1.0      0.6      2.8      0.0      12.2       9.6+/-  3.3       3.2   explicit[s=4]
   avg      3.0      0.9      1.2      1.7      2.1      0.5      1.7      1.3      0.8      3.8      0.0      17.0      13.5+/-  4.1       3.0   explicit[s=6]
   avg      3.6      1.3      1.7      2.3      2.6      0.7      2.2      1.7      1.0      5.0      0.0      22.2      18.0+/-  4.5       2.9   explicit[s=8] (this PR)

GNiendorf (Member, Author) commented:

@slava77 I think this PR is good to go. It represents most of the boilerplate changes from the CPU speedups PR.

github-actions commented:

The PR was built and ran successfully with CMSSW running on GPU. Here are some plots.

OOTB All Tracks
Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.

slava77 commented on Mar 19, 2026:

[image]

The GPU variant should have one more significant digit in the component columns (the total can still be shown with .1 precision).
I don't have a particular preference whether that goes in this PR or a separate one.

slava77 left a comment:

Nice updates.
I think the comment cleanup in the MiniDoublet code is a bit too aggressive. While removing some tautological comments is fine, quite a bit of the code is going to lose clarity. Please restore those comments.

GNiendorf force-pushed the cpu_speedups_hoist branch 2 times, most recently from 9aad224 to 727bac8, on March 19, 2026 20:50

GNiendorf (Member, Author) commented:

run-ci: all

GNiendorf force-pushed the cpu_speedups_hoist branch from 727bac8 to 2375562 on March 19, 2026 20:59

GNiendorf (Member, Author) commented:

run-ci: all

GNiendorf (Member, Author) commented:

run-ci: all
modifiers: gpu

github-actions commented:

The PR was built and ran successfully in standalone mode running on CPU. Here are some of the comparison plots.

Efficiency vs pT comparison Efficiency vs eta comparison
Fake rate vs pT comparison Fake rate vs eta comparison
Duplicate rate vs pT comparison Duplicate rate vs eta comparison

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     28.2    324.2    243.0    136.6     47.8    698.4     10.9    114.7    118.8    208.7      0.1    1931.4    1204.8+/- 289.9     596.7   explicit[s=4] (target branch)
   avg     31.1    219.2    182.6    133.5     47.6    698.9     10.8    110.7     83.1    185.9      0.1    1703.5     973.4+/- 227.2     541.9   explicit[s=4] (this PR)

github-actions commented:

The PR was built and ran successfully with CMSSW running on GPU. Here are some plots.

OOTB All Tracks
Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.

github-actions commented:

The PR was built and ran successfully with CMSSW running on CPU. Here are some plots.

OOTB All Tracks
Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.

github-actions commented:

The PR was built and ran successfully in standalone mode running on GPU. Here are some of the comparison plots.

Efficiency vs pT comparison Efficiency vs eta comparison
Fake rate vs pT comparison Fake rate vs eta comparison
Duplicate rate vs pT comparison Duplicate rate vs eta comparison

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     32.5      0.2      0.4      0.6      0.9      0.3      0.6      0.5      0.3      1.4      0.0      37.6       4.8+/-  2.6      37.6   explicit[s=1]
   avg      1.1      0.4      0.5      0.8      1.0      0.3      0.8      0.7      0.4      1.8      0.0       7.8       6.4+/-  2.9       4.0   explicit[s=2]
   avg      1.8      0.6      0.8      1.1      1.5      0.4      1.2      1.0      0.6      2.8      0.0      11.9       9.7+/-  3.5       3.1   explicit[s=4]
   avg      2.6      0.9      1.3      1.7      2.0      0.5      1.7      1.2      0.8      3.9      0.0      16.6      13.5+/-  4.1       2.9   explicit[s=6]
   avg      3.4      1.3      1.7      2.3      2.6      0.7      2.3      1.6      1.0      5.0      0.0      21.9      17.8+/-  4.5       2.8   explicit[s=8] (target branch)
   avg     32.6      0.2      0.4      0.6      0.8      0.3      0.6      0.5      0.3      1.4      0.0      37.7       4.8+/-  2.5      37.7   explicit[s=1]
   avg      1.2      0.4      0.5      0.8      1.0      0.3      0.8      0.8      0.4      1.9      0.0       7.9       6.5+/-  2.8       4.1   explicit[s=2]
   avg      1.8      0.6      0.8      1.1      1.5      0.4      1.2      1.0      0.6      2.8      0.0      11.8       9.6+/-  3.5       3.1   explicit[s=4]
   avg      2.6      1.0      1.2      1.7      2.0      0.5      1.7      1.3      0.8      4.0      0.0      16.7      13.6+/-  4.0       2.9   explicit[s=6]
   avg      3.4      1.3      1.7      2.2      2.6      0.7      2.2      1.7      1.0      5.0      0.0      21.9      17.8+/-  4.5       2.8   explicit[s=8] (this PR)

GNiendorf (Member, Author) commented:

run-ci: all

GNiendorf force-pushed the cpu_speedups_hoist branch from a2305fb to df0990a on March 19, 2026 23:24

github-actions commented:

The PR was built and ran successfully in standalone mode running on CPU. Here are some of the comparison plots.

Efficiency vs pT comparison Efficiency vs eta comparison
Fake rate vs pT comparison Fake rate vs eta comparison
Duplicate rate vs pT comparison Duplicate rate vs eta comparison

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     27.7    322.4    240.8    132.7     45.6    696.8     10.6    114.4    115.1    208.0      0.2    1914.4    1189.9+/- 285.9     594.9   explicit[s=4] (target branch)
   avg     30.6    102.9    179.8    126.4     48.6    681.7     10.6     39.0     67.1    209.9      0.1    1496.7     784.5+/- 195.0     491.4   explicit[s=4] (this PR)

GNiendorf changed the title from "Remove Redundant Memory Loads" to "CPU Optimizations" on Mar 19, 2026
GNiendorf changed the title from "CPU Optimizations" to "LST CPU Speedups" on Mar 19, 2026

github-actions commented:

The PR was built and ran successfully with CMSSW running on CPU. Here are some plots.

OOTB All Tracks
Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.

GNiendorf force-pushed the cpu_speedups_hoist branch from df0990a to 6ba26d8 on March 20, 2026 01:17

GNiendorf (Member, Author) commented:

run-ci: all

GNiendorf (Member, Author) commented:

run-ci: all
modifiers: gpu

github-actions commented:

The PR was built and ran successfully in standalone mode running on CPU. Here are some of the comparison plots.

Efficiency vs pT comparison Efficiency vs eta comparison
Fake rate vs pT comparison Fake rate vs eta comparison
Duplicate rate vs pT comparison Duplicate rate vs eta comparison

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     27.8    323.7    246.6    133.8     45.9    694.1     10.6    113.1    116.7    208.9      0.8    1922.1    1200.1+/- 288.6     593.4   explicit[s=4] (target branch)
   avg     28.9    104.0    137.1    117.7     50.5    685.8     10.4     38.1     68.6    188.3      0.1    1429.6     715.0+/- 182.9     472.5   explicit[s=4] (this PR)

github-actions commented:

The PR was built and ran successfully in standalone mode running on GPU. Here are some of the comparison plots.

Efficiency vs pT comparison Efficiency vs eta comparison
Fake rate vs pT comparison Fake rate vs eta comparison
Duplicate rate vs pT comparison Duplicate rate vs eta comparison

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     31.7      0.2      0.4      0.5      0.8      0.3      0.6      0.4      0.3      1.3      0.0      36.6       4.7+/-  2.4      36.6   explicit[s=1]
   avg      1.1      0.3      0.5      0.7      1.0      0.3      0.8      0.5      0.4      1.8      0.0       7.5       6.2+/-  2.9       3.9   explicit[s=2]
   avg      1.9      0.5      0.9      1.1      1.5      0.4      1.2      0.8      0.6      2.9      0.0      11.7       9.5+/-  3.5       3.0   explicit[s=4]
   avg      2.6      0.9      1.3      1.6      2.1      0.5      1.8      1.0      0.9      3.9      0.0      16.6      13.5+/-  4.0       2.9   explicit[s=6]
   avg      3.3      1.4      1.8      2.3      2.6      0.6      2.3      1.3      1.1      5.0      0.0      21.8      17.8+/-  4.6       2.8   explicit[s=8] (target branch)
   avg     32.2      0.2      0.4      0.5      0.8      0.3      0.6      0.4      0.3      1.3      0.0      37.2       4.7+/-  2.5      37.2   explicit[s=1]
   avg      1.0      0.3      0.5      0.7      1.0      0.3      0.8      0.5      0.4      1.8      0.0       7.4       6.1+/-  2.8       3.8   explicit[s=2]
   avg      1.8      0.6      0.9      1.1      1.5      0.4      1.2      0.8      0.6      2.9      0.0      11.7       9.4+/-  3.7       3.0   explicit[s=4]
   avg      2.6      0.9      1.3      1.6      2.0      0.5      1.7      1.0      0.8      3.9      0.0      16.5      13.4+/-  3.9       2.9   explicit[s=6]
   avg      3.3      1.4      1.7      2.4      2.7      0.7      2.4      1.3      1.1      5.0      0.0      22.1      18.1+/-  4.2       2.8   explicit[s=8] (this PR)

github-actions commented:

The PR was built and ran successfully with CMSSW running on GPU. Here are some plots.

OOTB All Tracks
Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.

github-actions commented:

The PR was built and ran successfully with CMSSW running on CPU. Here are some plots.

OOTB All Tracks
Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.
